Implementing Custom Metrics in ValidMind

ValidMind strives to offer a comprehensive set of metrics out of the box to help you evaluate and document your models and datasets. However, we understand that there will always be cases where a model or dataset is not supported, or where you need to document specific metrics that are not part of the default set. In these cases, you will want to create and use your own code to accomplish what you need, and we aim to make that process as seamless as possible. To this end, we offer support for custom metric functions. Custom metrics extend the default metrics provided by ValidMind, enabling you to document any type of model or use case. In this notebook, we will demonstrate how to implement custom metrics, register them with ValidMind, run them individually and view the results in the ValidMind platform, and, finally, add them to your model documentation template.

Prerequisites

We assume that you are familiar with Python and have a basic understanding of defining functions and using decorators. If you are new to these concepts, we recommend that you familiarize yourself with them before proceeding.

Key Concepts

  • Documentation Templates: Documentation templates are used to define the structure of your model documentation. They specify the tests that should be run, and how the results should be displayed. In the context of this tutorial, you will not need to know how templates work, merely how to add custom metrics to them via the ValidMind Platform.
  • Tests: Tests are the building blocks of ValidMind. They are used to evaluate and document models and datasets. Tests can be run individually or as part of a suite that is defined by your model documentation template.
  • Metrics: Metrics are a subset of tests that do not have thresholds. In the context of this notebook, you can think of metrics and tests as interchangeable concepts.
  • Custom Metrics: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with ValidMind to be used in the platform.
  • Inputs: In the ValidMind framework, inputs are objects to be evaluated and documented. They can be any of the following:
    • model: A single model that has been initialized in ValidMind with vm.init_model(). See the Model Documentation for more information.
    • dataset: Single dataset that has been initialized in ValidMind with vm.init_dataset(). See the Dataset Documentation for more information.
    • models: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.
    • datasets: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. See this example for more information.
  • Parameters: Parameters are additional arguments that can be passed when running a ValidMind test. These can be used to pass additional information to a metric, customize its behavior, or provide additional context.
  • Outputs: Custom metrics can return any number of the following elements (in any order):
    • table: Either a list of dictionaries where each dictionary represents a row in the table, or a pandas DataFrame.
    • plot: A matplotlib or plotly figure.
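To make the table output concrete, the two table forms are interchangeable. This is a small, hypothetical snippet (not part of the notebook's code) showing the same table expressed both ways:

```python
import pandas as pd

# a "table" output is a list of row dictionaries...
rows = [{"Metric": "accuracy", "Value": 0.91}, {"Metric": "f1", "Value": 0.88}]

# ...or, equivalently, a pandas DataFrame with the same content
df = pd.DataFrame(rows)
```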

Custom Metric Overview

A custom metric is any function that takes as arguments a set of inputs and optionally some parameters and returns one or more outputs. That’s it! The function can be as simple or as complex as you need it to be. It can use external libraries, make API calls, or do anything else that you can do in Python. The only requirement is that the function signature and return values can be “understood” and handled by the ValidMind developer framework.
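As a minimal, hypothetical sketch of that shape — the @vm.metric decorator is omitted so it runs standalone, and `dataset` is a plain dictionary standing in for a VMDataset:

```python
# Hypothetical sketch of the custom-metric shape: inputs in, optional
# parameters with defaults, and a table (list of row dictionaries) out.
# `dataset` here is a plain dict standing in for a VMDataset.
def accuracy_table(dataset, threshold: float = 0.5):
    """Report the share of correct predictions as a one-row table."""
    pairs = list(zip(dataset["y_true"], dataset["y_pred"]))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return [{"Accuracy": accuracy, "Meets threshold": accuracy >= threshold}]
```

Calling `accuracy_table({"y_true": [1, 0, 1, 1], "y_pred": [1, 0, 0, 1]})` returns a single row with an accuracy of 0.75.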

Now that you are familiar with what custom metrics are and the key concepts involved in creating and using them, let’s dive into some hands-on examples!

Before you begin

New to ValidMind?

To access the ValidMind Platform UI, you’ll need an account.

Signing up is FREE — Create your account.

If you encounter errors due to missing modules in your Python environment, install the modules with pip install, and then re-run the notebook. For more help, refer to Installing Python Modules.

Install the client library

%pip install -q validmind

Initialize the client library

ValidMind generates a unique code snippet for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

  1. In a browser, log into the Platform UI.

  2. In the left sidebar, navigate to Model Inventory and click + Register new model.

  3. Enter the model details, making sure to select Binary classification as the template and Marketing/Sales - Attrition/Churn Management as the use case, and click Continue. (Need more help?)

  4. Go to Getting Started and click Copy snippet to clipboard.

Next, replace this placeholder with your own code snippet:

# Replace with your code snippet

import validmind as vm

vm.init(
  api_host = "...",
  api_key = "...",
  api_secret = "...",
  project = "..."
)
2024-04-10 17:22:57,253 - INFO(validmind.api_client): Connected to ValidMind. Project: [Int. Tests] Customer Churn - Initial Validation (cltnl29bz00051omgwepjgu1r)

Implement a Custom Metric

Let’s start off by creating a simple custom metric that produces a Confusion Matrix for a binary classification model. We will use the sklearn.metrics.confusion_matrix function to calculate the confusion matrix and then display it as a heatmap. (This is already a built-in metric in ValidMind, but we will use it as an example to demonstrate how to create custom metrics.)

import matplotlib.pyplot as plt
from sklearn import metrics


@vm.metric("my_custom_metrics.ConfusionMatrix")
def confusion_matrix(dataset, model):
    """The confusion matrix is a table that is often used to describe the performance of a classification model on a set of data for which the true values are known.

    The confusion matrix is a 2x2 table that contains 4 values:

    - True Positive (TP): the number of correct positive predictions
    - True Negative (TN): the number of correct negative predictions
    - False Positive (FP): the number of incorrect positive predictions
    - False Negative (FN): the number of incorrect negative predictions

    The confusion matrix can be used to assess the holistic performance of a classification model by showing the accuracy, precision, recall, and F1 score of the model on a single figure.
    """
    y_true = dataset.y
    y_pred = dataset.y_pred(model_id=model.input_id)

    confusion_matrix = metrics.confusion_matrix(y_true, y_pred)

    cm_display = metrics.ConfusionMatrixDisplay(
        confusion_matrix=confusion_matrix,
        display_labels=[False, True]
    )
    cm_display.plot()

    plt.close()  # close the plot to avoid displaying it
    
    return cm_display.figure_  # return the figure object itself

That’s our custom metric defined and ready to go… Let’s take a look at what’s going on here:

  • The function confusion_matrix takes two arguments, dataset and model. These are a VMDataset and VMModel object, respectively.
  • The function docstring provides a description of what the metric does. This will be displayed along with the result in this notebook as well as in the ValidMind platform.
  • The function body calculates the confusion matrix using the sklearn.metrics.confusion_matrix function and then plots it using sklearn.metrics.ConfusionMatrixDisplay.
  • The function then returns the ConfusionMatrixDisplay.figure_ object - this is important, as the ValidMind framework expects the output of the custom metric to be a plot or a table.
  • The @vm.metric decorator creates a wrapper around the function that allows it to be run by the ValidMind framework. It also registers the metric so it can be found by the ID my_custom_metrics.ConfusionMatrix (see the section below on how test IDs work in ValidMind and why this format is important).

Run the Custom Metric

Now that we have defined and registered our custom metric, let’s see how we can run it and use it in the ValidMind platform.

Setup the Model and Dataset

First, let’s set up an example model and dataset to run our custom metric against. Since this is a Confusion Matrix, we will use the Customer Churn dataset that ValidMind provides and train a simple XGBoost model.

import xgboost as xgb
from validmind.datasets.classification import customer_churn

raw_df = customer_churn.load_data()
train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=10,
              enable_categorical=False, eval_metric=['error', 'logloss', 'auc'],
              feature_types=None, gamma=None, gpu_id=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=None, ...)

Easy enough! Now we have a model and dataset set up and trained. One last thing to do is bring the dataset and model into the ValidMind framework:

# for now, we'll just use the test dataset
vm_test_ds = vm.init_dataset(
    dataset=test_df,
    target_column=customer_churn.target_column,
    input_id="test_dataset",
)

vm_model = vm.init_model(model, input_id="model")

# link the model to the dataset
vm_test_ds.assign_predictions(model=vm_model)
2024-04-10 17:22:57,424 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:22:58,274 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while

Run the Custom Metric

Now that we have our model and dataset setup, we have everything we need to run our custom metric. We can do this by importing the run_test function from the validmind.tests module and passing in the test ID of our custom metric along with the model and dataset we want to run it against.

Notice how the inputs dictionary is used to map an input_id which we set above to the model and dataset keys that are expected by our custom metric function. This is how the ValidMind framework knows which inputs to pass to different metrics and is key when using many different datasets and models.

from validmind.tests import run_test

result = run_test("my_custom_metrics.ConfusionMatrix", inputs={"model": "model", "dataset": "test_dataset"})

You’ll notice that the docstring becomes a markdown description of the test. The figure is then displayed as the test result. What you see above is how it will look in the ValidMind platform as well. Let’s go ahead and log the result to see how that works.

result.log()

Adding Custom Metrics to Model Documentation

To do this, go to the documentation page of the model you registered above and navigate to the Model Development -> Model Evaluation section. Then hover between any existing content block to reveal the + button as shown in the screenshot below.

screenshot showing insert button for test-driven blocks

Now click on the + button and select the Test-Driven Block option. This opens a dialog where you can select Metric as the type of test and My Custom Metrics Confusion Matrix from the list of available metrics. You can preview the result and then click Insert Block to add it to the documentation.

screenshot showing how to insert a test-driven block

The test should match the result you see above. It is now part of your documentation and will be run every time you run vm.run_documentation_tests() for your model. Let’s do that now.

vm.reload()
2024-04-10 17:22:59,057 - INFO(validmind.api_client): Connected to ValidMind. Project: [Int. Tests] Customer Churn - Initial Validation (cltnl29bz00051omgwepjgu1r)

If you preview the template, it should show the custom metric in the Model Development -> Model Evaluation section:

vm.preview_template()

Just so we can run all of the tests in the template, let’s initialize the train and raw datasets.

(see the quickstart_customer_churn_full_suite.ipynb notebook and the ValidMind docs for more information on what we are doing here)

vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=customer_churn.target_column,
    class_labels=customer_churn.class_labels,
)

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=customer_churn.target_column,
)
vm_train_ds.assign_predictions(model=vm_model)
2024-04-10 17:22:59,823 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:22:59,878 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:22:59,927 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while

To run all the tests in the template, you can use vm.run_documentation_tests() and pass the inputs we initialized above along with the demo config from our customer_churn module. We have to add a section to the config for our new test to tell it which inputs it should receive. This is done by adding a new element to the config dictionary, where the key is the ID of the test and the value is a dictionary with the following structure:

{
    "inputs": {
        "dataset": "test_dataset",
        "model": "model",
    }
}
from validmind.utils import preview_test_config

test_config = customer_churn.get_demo_test_config()
test_config["my_custom_metrics.ConfusionMatrix"] = {
    "inputs": {
        "dataset": "test_dataset",
        "model": "model",
    }
}
preview_test_config(test_config)
full_suite = vm.run_documentation_tests(config=test_config)
2024-04-10 17:22:59,938 - WARNING(validmind.vm_models.test_suite.runner): Config key 'my_custom_metrics.ConfusionMatrix' does not match a test_id in the template.
    Ensure you registered a content block with the correct content_id in the template
    The configuration for this test will be ignored.

Some More Custom Metrics

Now that you understand the entire process of creating custom metrics and using them in your documentation, let’s create a few more to see different ways you can utilize custom metrics.

Custom Metric: Table of Model Hyperparameters

This custom metric will display a table of the hyperparameters used in the model:

@vm.metric("my_custom_metrics.Hyperparameters")
def hyperparameters(model):
    """The hyperparameters of a machine learning model are the settings that control the learning process.
    These settings are specified before the learning process begins and can have a significant impact on the
    performance of the model.

    The hyperparameters of a model can be used to tune the model to achieve the best possible performance
    on a given dataset. By examining the hyperparameters of a model, you can gain insight into how the model
    was trained and how it might be improved.
    """
    hyperparameters = model.model.get_xgb_params() # dictionary of hyperparameters

    # turn the dictionary into a table where each row contains a hyperparameter and its value
    return [{"Hyperparam": k, "Value": v} for k, v in hyperparameters.items() if v]


result = run_test("my_custom_metrics.Hyperparameters", inputs={"model": "model"})
result.log()

Since the metric has been run and logged, you can add it to your documentation using the same process as above. It should look like this:

screenshot showing hyperparameters metric

For our simple toy model, there aren’t really any proper hyperparameters, but you can see how this could be useful for more complex models that have gone through hyperparameter tuning.
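The same pattern extends beyond XGBoost: any scikit-learn-compatible estimator exposes get_params(), so a more general variant (a hypothetical helper, not part of the notebook) could build its table like this:

```python
from sklearn.linear_model import LogisticRegression

def params_table(estimator):
    # turn the estimator's parameter dictionary into a list-of-dicts table
    return [
        {"Hyperparam": k, "Value": v}
        for k, v in estimator.get_params().items()
        if v is not None
    ]

table = params_table(LogisticRegression(C=0.5, max_iter=200))
```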

Custom Metric: External API Call

This custom metric will make an external API call to get the current BTC price and display it as a table. This demonstrates how you might integrate external data sources into your model documentation in a programmatic way. You could, for instance, set up a pipeline that runs a metric like this every day to keep your model documentation in sync with an external system.

import requests


@vm.metric("my_custom_metrics.ExternalAPI")
def external_api():
    """This metric calls an external API to get the current BTC price. It then creates
    a table with the relevant data so it can be displayed in the documentation.

    The purpose of this metric is to demonstrate how to call an external API and use the
    data in a metric. A metric like this could even be setup to run in a scheduled
    pipeline to keep your documentation in-sync with an external data source.
    """
    url = "https://api.coindesk.com/v1/bpi/currentprice.json"
    response = requests.get(url)
    data = response.json()

    # extract the time and the current BTC price in USD
    return [
        {
            "Time": data["time"]["updated"],
            "Price (USD)": data["bpi"]["USD"]["rate"],
        }
    ]


result = run_test("my_custom_metrics.ExternalAPI")
result.log()

Again, you can add this to your documentation to see how it looks:

screenshot showing BTC price metric
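One caveat for scheduled pipelines: external calls can fail, and an unhandled exception would fail the whole metric run. Here is a hedged sketch of a guard (a hypothetical helper, not part of the notebook) that degrades to an error row instead of crashing:

```python
import requests

def safe_fetch(url, timeout=10):
    """Fetch JSON from an external API, returning an error dict on failure."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors as exceptions
        return {"ok": True, "data": response.json()}
    except requests.RequestException as exc:
        # return a table-friendly error instead of raising, so a scheduled
        # documentation run records the failure rather than aborting
        return {"ok": False, "error": str(exc)}
```

A metric built on this helper can then return an error table when the API is unreachable, keeping the rest of the documentation run intact.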

Custom Metric: Passing Parameters

Custom metric functions, as stated earlier, can take both inputs and params. When you define your function, there is no need to distinguish between the two; the ValidMind framework handles that for you. Simply add both to the function as arguments, and the framework will pass in the correct values.

So for instance, if you wanted to parameterize the first custom metric we created, the confusion matrix, you could do so like this:

def confusion_matrix(dataset: VMDataset, model: VMModel, my_param: str = "Default Value"):
    pass

And then when you run the test, you can pass in the parameter like this:

run_test(
    "my_custom_metrics.ConfusionMatrix",
    inputs={"model": "model", "dataset": "test_dataset"},
    params={"my_param": "My Value"},
)

Or if you are running the entire documentation template, you would update the config like this:

test_config["my_custom_metrics.ConfusionMatrix"] = {
    "inputs": {
        "dataset": "test_dataset",
        "model": "model",
    },
    "params": {
        "my_param": "My Value",
    },
}

Let’s go ahead and create a toy metric that takes a parameter and uses it in the result:

import plotly.express as px


@vm.metric("my_custom_metrics.ParameterExample")
def parameter_example(plot_title="Default Plot Title", x_col="sepal_width", y_col="sepal_length"):
    """This metric takes two parameters and creates a scatter plot based on them.

    The purpose of this metric is to demonstrate how to create a metric that takes
    parameters and uses them to generate a plot. This can be useful for creating
    metrics that are more flexible and can be used in a variety of scenarios.
    """
    return px.scatter(px.data.iris(), x=x_col, y=y_col, color="species", title=plot_title)


result = run_test(
    "my_custom_metrics.ParameterExample",
    params={
        "plot_title": "My Cool Plot",
        "x_col": "sepal_width",
        "y_col": "sepal_length",
    },
)
result.log()

Play around with this and see how you can use parameters, default values and other features to make your custom metrics more flexible and useful.

Here’s how this one looks in the documentation: screenshot showing parameterized metric

Custom Metric: Multiple Tables and Plots in a Single Metric

Custom metric functions, as stated earlier, can return more than just one table or plot. In fact, any number of tables and plots can be returned. Let’s see an example of this:

import numpy as np
import plotly.express as px

@vm.metric("my_custom_metrics.ComplexOutput")
def complex_output():
    """This metric demonstrates how to return many tables and figures in a single metric"""
    # create a couple tables
    table = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
    table2 = [{"C": 5, "D": 6}, {"C": 7, "D": 8}]

    # create a few figures showing some random data
    fig1 = px.line(x=np.arange(10), y=np.random.rand(10), title="Random Line Plot")
    fig2 = px.bar(x=["A", "B", "C"], y=np.random.rand(3), title="Random Bar Plot")
    fig3 = px.scatter(x=np.random.rand(10), y=np.random.rand(10), title="Random Scatter Plot")

    return {
        "My Cool Table": table,
        "Another Table": table2,
    }, fig1, fig2, fig3


result = run_test("my_custom_metrics.ComplexOutput")
result.log()

Notice how you can return the tables as a dictionary, where the key is the title of the table and the value is the table itself. You could also just return the tables by themselves, but this way you can give them a title to identify them more easily in the result.

screenshot showing multiple tables and plots

Custom Metric: Images

If you are using a plotting library that isn’t supported by ValidMind (i.e. not matplotlib or plotly), you can still return the image directly as a bytes-like object. This could also be used to bring any type of image into your documentation in a programmatic way. For instance, you may want to include a diagram of your model architecture or a screenshot of a dashboard that your model is integrated with. As long as you can produce the image with Python or open it from a file, you can include it in your documentation.

import io
import matplotlib.pyplot as plt


@vm.metric("my_custom_metrics.Image")
def image():
    """This metric demonstrates how to return an image in a metric"""

    # create a simple plot
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4])
    ax.set_title("Simple Line Plot")

    # save the plot as a PNG image (in-memory buffer)
    img_data = io.BytesIO()
    fig.savefig(img_data, format="png")
    img_data.seek(0)

    plt.close()  # close the plot to avoid displaying it

    return img_data.read()


result = run_test("my_custom_metrics.Image")
result.log()

Adding this custom metric to your documentation will display the image:

screenshot showing image custom metric
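As noted above, the image does not have to be generated in memory; it can equally be read from disk. A standalone sketch (the file name is hypothetical):

```python
from pathlib import Path

def architecture_diagram(path="model_architecture.png"):  # hypothetical file
    """Return the raw bytes of an image on disk so it can be displayed."""
    return Path(path).read_bytes()
```

The returned bytes object is the same kind of value the in-memory example above produces with img_data.read().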

Conclusion

In this notebook, we demonstrated how to create custom metrics in ValidMind: defining custom metric functions, registering them with the ValidMind framework, running them against models and datasets, and adding them to model documentation templates. We also showed how to return tables and plots from custom metrics and how they appear in the ValidMind platform. We hope this tutorial has been helpful in understanding how to create and use custom metrics in ValidMind.

As always, to learn more about ValidMind, please visit our documentation.